Portable Compilation of Vector Expressions for Architectures with Memory Hierarchy

Authors

  • A. Kalinov
  • A. Lastovetsky
  • M. Posypkin
Abstract

The paper presents a code generation scheme for vector expressions, implemented in the C[] compiler (C[] is a vector superset of ANSI C aimed at vector and superscalar architectures). The scheme is based on two well-known optimization techniques: loop-invariant code motion and iteration space tiling. The problem of finding the optimal tile size for the imperfectly nested loop system that implements a vector expression is addressed. Some experimental results demonstrating the efficiency of the code generation scheme are also presented.
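
The two techniques named in the abstract can be illustrated with a minimal sketch in plain ANSI-style C. This is not the C[] compiler's actual output: the function names `eval_naive` and `eval_tiled`, the expression `d = (s1*s2) * (a + b) * c`, and the tile size `TILE` are all assumptions chosen for illustration. The naive version writes a full-length temporary and recomputes the scalar product on every iteration; the tiled version hoists the invariant product out of the loops and processes the vectors in strips, so the temporary can stay in cache.

```c
#include <stddef.h>

/* Hypothetical tile size; the optimal value discussed in the paper
 * is not given in the abstract. */
#define TILE 256

/* Naive evaluation of d = (s1*s2) * (a + b) * c.
 * The temporary tmp has full length n, and s1*s2 is recomputed
 * on every iteration of the second loop. */
void eval_naive(size_t n, const double *a, const double *b,
                const double *c, double s1, double s2,
                double *tmp, double *d)
{
    size_t i;
    for (i = 0; i < n; i++)
        tmp[i] = a[i] + b[i];
    for (i = 0; i < n; i++)
        d[i] = (s1 * s2) * tmp[i] * c[i];
}

/* Same expression with loop-invariant code motion (scale is computed
 * once) and tiling (the temporary holds only TILE elements). The
 * result is an imperfectly nested loop system: one outer loop over
 * tiles containing two inner loops. */
void eval_tiled(size_t n, const double *a, const double *b,
                const double *c, double s1, double s2, double *d)
{
    double tmp[TILE];
    const double scale = s1 * s2;   /* hoisted out of all loops */
    size_t lo, k;

    for (lo = 0; lo < n; lo += TILE) {
        /* The last tile may be shorter than TILE. */
        size_t len = (n - lo < TILE) ? (n - lo) : TILE;
        for (k = 0; k < len; k++)
            tmp[k] = a[lo + k] + b[lo + k];
        for (k = 0; k < len; k++)
            d[lo + k] = scale * tmp[k] * c[lo + k];
    }
}
```

Both functions perform the same arithmetic in the same order for each element, so they are intended to produce the same results; only the memory access pattern differs. In practice the tile size would be tuned to the cache of the target machine, which is the problem the abstract says the paper addresses.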

Similar resources

Accelerating QDP++ using GPUs

Graphic Processing Units (GPUs) are getting increasingly important as target architectures in scientific High Performance Computing (HPC). NVIDIA established CUDA as a parallel computing architecture controlling and making use of the compute power of their GPUs. CUDA provides sufficient support for C++ language elements to enable the Expression Template (ET) technique in the device memory domai...

Compilation and Simulation Tool Chain for Memory Aware Energy Optimizations

Memory hierarchies are known to be the energy bottleneck of portable embedded devices. Numerous memory aware energy optimizations have been proposed. However, both the optimization and the validation is performed in an ad-hoc manner as a coherent compilation and simulation framework does not exist as yet. In this paper, we present such a framework for performing memory hierarchy aware energy op...

Compilation techniques for parallel systems

Over the past two decades tremendous progress has been made in both the design of parallel architectures and the compilers needed for exploiting parallelism on such architectures. In this paper we summarize the advances in compilation techniques for uncovering and effectively exploiting parallelism at various levels of granularity. We begin by describing the program analysis techniques through w...

Two Fundamental Limits on Dataflow Multiprocessing

This paper examines the argument for dataflow architectures in “Two Fundamental Issues in Multiprocessing[5].” We observe two key problems. First, the justification of extensive multithreading is based on an overly simplistic view of the storage hierarchy. Second, the local greedy scheduling policy embodied in dataflow is inadequate in many circumstances. A more realistic model of the storage h...

GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs

Spatial blocking is a critical memory-access optimization to efficiently exploit the computing resources of parallel processors, such as many-core GPUs. By reusing cache-loaded data over multiple spatial iterations, spatial blocking can significantly lessen the pressure of accessing slow global memory. Stencil computations, for example, can exploit such data reuse via spatial blocking through t...

Publication date: 1999